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Abstract We describe a new approach allowing for 
systematic causal attribution of weather and climate- 
related events, in near-real time. The method is pur¬ 
posely designed to facilitate its implementation at me¬ 
teorological centers by relying on data treatments that 
are routinely performed when numerically forecasting 
the weather. Namely, we show that causal attribution 
can be obtained as a by-product of so-called data as- 


A. Hannart 

IFAECI, CNRS-CONICET-UBA 

Pab. II, piso 2, Ciudad Universitaria 

1428 Buenos Aires, Argentina 

Tel.: -1-5411-4787-2693 

Fax: -1-5411-4788-3572 

E-mail: alexis.hannart@cima.fcen.uba.ar 

A. Carrassi 

Mohn-Sverdrup Center, Nansen Environmental and Remote 
Sensing Center, Bergen, Norway 

M. Bocquet 

CEREA, Ecole des Fonts, Marne-la-Vallee, France 
M. Ghil 

Ecole Normale Superieure, Paris, France 
University of California, Los Angeles, USA 

P. Naveau 

LSCE, CNRS, Gif-sur-Yvette, France 
M. Pulido 

Dept, of Physics, Universidad Nacional del Nordeste, Corri- 
entes, Argentina 

J. Ruiz 

IFAECI, CNRS/CONICET/UBA, Buenos Aires, Argentina 
P. Tandeo 

Telecom Bretagne, Brest, France 


similation procedures that are run on a daily basis 
to update the meteorological model with new atmo¬ 
spheric observations; hence, the proposed methodology 
can take advantage of the powerful computational and 
observational capacity of weather forecasting centers. 
We explain the theoretical rationale of this approach 
and sketch the most prominent features of a “data as¬ 
similation based detection and attribution” (DADA) 
procedure. The proposal is illustrated in the context 
of the classical three-variable Lorenz model with ad¬ 
ditional forcing. Several theoretical and practical re¬ 
search questions that need to be addressed to make the 
proposal readily operational within weather forecasting 
centers are finally laid out. 

Keywords Event attribution • Data assimilation • 
Causality theory • Modified Lorenz model 

1 Background and motivation 

A significant and growing part of climate research stud¬ 
ies the causal links between climate forcings and ob¬ 
served responses. This part has been consolidated into 
a research topic known as detection and attribution 
(D&A). The D&A community has increasingly been 
faced with the challenge of generating causal informa¬ 
tion about episodes of extreme weather or unusual cli¬ 
mate conditions. This challenge arises from the needs 
for public dissemination, litigation in a legal context, 
adaptation to climate change or simply improvement 
of the science associated with these events (Stott et ah, 
2015). 

The approach widely used so far to in D&A was in¬ 
troduced one decade ago by M.R. Allen and colleagues 
(Allen, 2003; Stone and Allen, 2005) and it originates 
from best practices in epidemiology (Greenland and 
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Rothman, 1998). In this approach, one evaluates the 
extent to which a given external climate forcing — 
such as solar irradiation, greenhouse gas (GHG) emis¬ 
sions, ozone or aerosol concentrations — has changed 
the probability of occurrence of an event of interest. 

For this purpose, one compares the probability of 
occurrence of said event in an ensemble of model sim¬ 
ulations representing the observed climatic conditions, 
which simulates the actual occurrence probability in the 
real world, with the occurrence probability of the same 
event in a parallel ensemble of model simulations, which 
represent an alternative world. The former world is re¬ 
ferred to as factual, the latter as counterfactual: it is 
the latter that might have occurred had the forcing of 
interest been absent. 

Denoting by pi and po the probabilities of the event 
occurring in the factual world and in the counterfactual 
world respectively, the so-called fraction of attributable 
risk (FAR) is then defined as FAR= 1—po/Pi- The FAR 
has long been interpreted as the fraction of the likeli¬ 
hood of an event which is attributable to the external 
forcing. Over the past decade, most causal claims have 
been following from the FAR and its uncertainty, re¬ 
sulting in statements such as “It is very likely that over 
half the risk of European summer temperature anoma¬ 
lies exeeeding a threshold of 1.6° C is attributable to hu¬ 
man influence.” (Stott et ah, 2004). 

Hannart et al. (2015) have recently shown that, un¬ 
der realistic assumptions, the FAR may also be inter¬ 
preted as the so-called probability of necessary causation 
(PN) associated — in a complete and self-consistent 
theory of causality (Pearl, 2000) — with the causal link 
between the forcing and the event. The FAR thus cor¬ 
responds to only one of the two facets of causality in 
such a theory, while the probability of sufficient causa¬ 
tion (PS) is its second facet. 

In this setting, 


PN = 1 - —, 

Pi 

(la) 

PS = 1 - , 

1 -Po 

(lb) 

PNS =pi-po, 

(Ic) 


where PNS is the probability of necessary and sufficient 
causation. 

Pearl (2000) provides rigorous definitions of these 
three concepts, as well as a detailed discussion of their 
meanings and implications. It can be seen from Eqs. 
(1) that causal attribution requires to evaluate the two 
probabilities, po and pi, and not just one of them. Doing 
so is, therefore, the central methodological question of 
D&A for weather and climate-related events. 


So far, most case studies have used large ensembles 
of climate model simulations in order to estimate pi and 
Po based on a variety of methods, in particular based 
on statistical extreme value theory (EVT). However, 
this general approach has a very high computational 
cost and is difficult to implement in a timely and sys¬ 
tematic way. As recognized by Stott et al. (2015), this 
remains an open problem: “the overarching challenge 
for the community is to move beyond research-mode 
case studies and to develop systems that can deliver 
regular, reliable and timely assessments in the after- 
math of notable weather and climate-related events, 
typically in the weeks or months following (and not 
many years later as is the case with some research¬ 
mode studies)”. Eor instance, the weatherOhome system 
(Massey et ah, 2014), or the system proposed by Ghris- 
tidis et al. (2013), aim at meeting those requirements 
within the conventional ensemble-based approach. On¬ 
going research aiming towards the development of such 
a system also include the GASCADE project (Cali¬ 
brated and Systematic Characterization, Attribution 
and Detection of Extremes, U.S. Department of Energy, 
Regional and Global Climate Modeling program). 

The purpose of this article is to introduce a new 
methodological approach that addresses the latter over¬ 
arching operational challenge. Our proposal relies on 
a class of powerful statistical methods for interfacing 
high-dimensional models with large observational datasets. 
This class of methods originates from the field of weather 
forecasting and is referred to as data assimilation (DA) 
(Bengtsson et ah, 1981; Ghil and Malanotte-Rizzoli, 
1991; Talagrand, 1997). 

Section 2 explains the rationale of the approach pro¬ 
posed herein, presents a brief overview of DA, and out¬ 
lines the most prominent technical features of a “data 
assimilation-based detection and attribution” (DADA) 
approach. Section 3 illustrates the proposal by imple¬ 
menting it on a version of the classical Lorenz convec¬ 
tion model (Lorenz, 1963, L63 hereafter) subject to an 
additional constant force. Finally, in Section 4, we dis¬ 
cuss the main strengths and limitations of the DADA 
approach, and highlight several theoretical and prac¬ 
tical research questions that need to be addressed to 
make it potentially operational within weather forecast¬ 
ing centers in a near future. 


2 Method description 

2.1 General rationale 

The rationale for addressing causal attribution of climate- 
related events based on DA concepts and methods can 
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be outlined in three steps. To do so briefly and clearly, 
we need to introduce some notation. 

Let Yt denote the d-dimensional vector of observa¬ 
tions at discrete times {t = 0,1,..., T}. Here, y = {y* : 
0 < t < r} corresponds, for instance, to the full set 
of all available meteorological observations over a time 
interval covering the event of interest, no matter the 
diversity and source of the data; typically, the latter in¬ 
clude ground station networks, satellite measurements, 
ship data, and so on, cf. (Bengtsson et ah, 1981, Preface, 
Fig. 1) or (Ghil and Malanotte-Rizzoli, 1991, Fig. 1). 
In the present probabilistic D&A context, the observed 
trajectory y is viewed as a realization of a random vari¬ 
able denoted Y = {Yt ■ 0 < t < T}, i.e. there exists 
an a; € 17 such that Y (w) = y — where ^2 denotes the 
sample space of all possible outcomes and encompasses 
observational error, as well as internal variability. 

In event attribution studies, it is recognized that 
defining the occurrence of an event, i.e. selecting a sub¬ 
set C 17, depends on a rather arbitrary choice. Yet 
this choice has been shown to greatly affect causal con¬ 
clusions (Hannart et ah, 2015). For instance, a generic 
and fairly loose event definition is arguably prone to 
yield a low threshold of evidence with respect to both 
necessary and sufficient causality while, on the other 
hand, a tighter and more specific event dehnition is 
prone to yield a stringent threshold for necessary causal¬ 
ity but a reduced one for sufficient causality. 

Indeed, it is quite intuitive that many different fac¬ 
tors should usually be necessary to trigger the occur¬ 
rence of a highly specific event and conversely, that no 
single factor will ever hold as a sufficient explanation 
thereof. For the class of unusual events at stake in D&A, 
where both pq and pi are very small, we arguably lean 
towards specific definitions that inherently result in few 
sufficient causal factors or none. This conclusion imme¬ 
diately follows from Eq. (lb), which yields PS ~ 0 when 
both Pq and pi are very small. 

Usually, an event occurrence is defined in D&A based 
on an ad hoc scalar index (j){Y) exceeding a threshold 
u, i.e. Pi = P(0(Y) > u); from now on, we associate 
i = 0 with the counterfactual and i = 1 with the fac¬ 
tual world. While this definition may be already quite 
restrictive for u large, it is a defensible strategy to re¬ 
strict the event definition even further: this may slightly 
reduce an already negligible PS but in return may po¬ 
tentially increase PN by a greater amount; one thus 
expects to gain more than one loses in this trade-off. In 
particular, this will be the case if additional features, 
not accounted for in (f>{Y), can be identified that will al¬ 
low one to further discriminate between the two worlds. 

In any case, a central element of our proposal is 
to follow this strategy in its simplest possible form, by 


using the tightest occurrence definition i.e. the singleton 
{w € 17 I Y(w) = y}. Note that the latter singleton has 
probability zero in both worlds because the probability 
density function (PDF) /(Y(w)) of Y can be assumed, 
in general, to be continuous, i.e. to contain no singular 
^-functions. 

Consider, however, the paradox that arises from tak¬ 
ing the limit h —> 0 for the set {w G 17 | || Y (w)-y|| < 
h}. This set has non-zero probability for h arbitrarily 
small but positive while, in the limit, 

PN=1-^^^, PS = 0, (2) 

/i(y) 


where fi denotes the PDF of Y in world i. Equation (2) 
thus shows that, while the probabilities of occurrence 
of our singleton event in both worlds are null, its asso¬ 
ciated probability of necessary causation is still positive 
— but its probability of sufficient causation is always 
zero. Our proposal thus intentionally sacrifices evidence 
of sufficiency, in the hope of maximizing the evidence 
of necessity. 

Our betting on the singleton set is thus justifiable 
already based on the above theoretical considerations. 
This choice, moreover, is motivated by having a highly 
simplifying implication from a practical standpoint. Eval¬ 
uating the PDF of Y at a single point Y = y is in¬ 
deed, under many circumstances, considerably easier 
than evaluating the probability P(^(Y) > u) required 
in the conventional approach. 

To illustrate this point, let Y be for instance a d- 
variate autoregressive process defined by Yj+i = AYj-l- 
wt, where wt is an i.i.d. noise having known PDF g(-) 
and where A has the usual properties that insure sta- 
tionarity (Gardiner, 2004). We then have: 


T 

/(y) = n “ Ayt_i)7r(yo), 

T 

P{<p{Y) >u)=( nff(yt - Ayt_i) 

J<j>(y)>u 

X 7r(yo)dj/qo . •. dy^.o • ■ • dyd,T , 


(3a) 

(3b) 


with 7r(-) the prior PDF on the initial state Yq. Equa¬ 
tion (3a) shows that /(y) can be easily computed us¬ 
ing a closed-form expression, while P{(j){Y) > u) in 
Eq. (3b) is an integral on d x T 1 dimensions which 
must instead be evaluated by using, for instance, a com¬ 
putationally quite costly Monte-Garlo (MG) simulation. 

Figure 1 illustrates this situation by showing the 
details of the latter MG evaluation for a scalar AR(I) 
process (panel a, when based on a standard EVT appli¬ 
cation, as well as its associated accuracy (panels b and 
c), and the computational cost as the MG sample size 
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n varies (panel d); the latter cost is much larger than 
the one of applying the DADA approach. This simple 
example confirms the large computational discrepancy 
between the two approaches. 

The reason for the discrepancy is quite simple: eval¬ 
uating the conventional probability requires integrating 
a PDF over a predefined domain, instead of a one-off 
evaluation at a single point. Because both the domain 
of integration and the PDF may have potentially com¬ 
plex shapes, one cannot expect, in general, that the 
requisite integral be amenable to analytical treatment. 
Hence numerical integration is the default option: no 
matter how efficient an integration scheme one applies, 
it will require evaluating the PDF at many points and is 
thus as many times more costly computationally than 
just evaluating /(y) at a single point. 

This being said, it is not always straightforward to 
obtain the PDF of Y. This is the case, for instance, for 
the wide class of statistical models referred to as Hidden 
Markov Models (HMMs); in fact, HMMs [e.g., (Ihler et 
al., 2007, and references therein)] are often relevant in 
the present context to describe Y. 

More precisely, assume that the event of interest 
can be represented by a large numerical model which 
A^-dimensional state vector at time t is denoted X(. The 
dynamics of the state vector is given by: 

Xi+i =M(X*,F*)+Vi, (4) 

where M is the model operator, Vt is a stochastic term 
representing modeling error, and is a known, pre¬ 
scribed forcing that is external to the model. In the 
present context, it is precisely the forcing term F = 
(Fi)^g that is under causal scrutiny. Further, assume 
that our observations Yj can be mapped to the state 
vector Xt at any time t, i.e. 

Yi=H(Xt)+wt (5) 

where H is the so-called observation or forward operator 
and Wj is a stochastic term representing observational 
error. 

Denoting by F^*) the value of the forcing in the world 
i, using the shorthand Mi{xt) = M(x(, Fj ) and denot¬ 
ing by Aii the HMM associated with H and M^, the 
problem of interest here is thus to derive: 

/o(y) = /(y I Mo) and /i(y) = /(y | Mi) , (6) 

where /o(y) and /i(y) should be interpreted as the like¬ 
lihoods of the observation y in the counterfactual and 
factual models, respectively. 

Finally getting to our point, one can view DA meth¬ 
ods as a class of inference methods designed for the 


above HMM setting. Actually, Ihler et al. (2007) al¬ 
ready formulated both DA and HMMs within the broader 
class of graphical models for statistical inference. 

While inferring the unknown state vector trajectory 
X, given the observed trajectory y, is clearly the main 
focus of DA, the likelihood /(y) can also be obtained 
as a side product thereof, as we will immediately clarify 
below. Therefore, with DA able to derive the two like¬ 
lihoods /o(y) and /i(y), and the latter two being the 
keys to causal attribution in our approach, one should 
be capable of moving towards near-real-time, system¬ 
atic causal attribution of weather- and climate-related 
events. 

2.2 Brief overview of data assimilation 

DA was initially developed in the context of numerical 
weather forecasting, in order to initialize the model’s 
state variables X based on observations y that are in¬ 
complete, diverse in nature, unevenly distributed in space 
and time, do not necessarily match the model’s state 
variables, and are contaminated by measurement error 
(Bengtsson et al., 1981; Talagrand, 1997). Over the past 
decades, those methods have grown out of their origi¬ 
nal application field to reach a wide variety of topics in 
geophysics such as oceanography (Ghil and Malanotte- 
Rizzoli, 1991), atmospheric chemistry, geomagnetism, 
hydrology, and space physics, among many other areas 
(Robert et ah, 2006; Cosme et al., 2010; Kondrashov et 
al., 2011; Bocquet, 2012; Martin et al., 2014). 

DA is already playing an increasing role in the cli¬ 
mate sciences, having being applied, for instance, to 
initialize a climate model for seasonal or decadal pre¬ 
diction (Balmaseda et ah, 2009), to constrain a climate 
model’s parameters (Kondrashov et ah, 2008; Ruiz et 
al., 2013), to infer carbon cycle fluxes from atmospheric 
concentrations (Chevallier, 2013), or to reconstruct pa- 
leoclimatic fields out of sparse and indirect observations 
(Bhend et ah, 2012; Roques et ah, 2014). In the con¬ 
text of D&A, Lee et al. (2008) actually tested a DA-like 
approach to include the effects of the various forcings 
over the last millennium, in addition to other paleocli- 
mate proxy data, in combined climate reconstruction 
and detection analysis. The present work thus follows 
and further strengthens a general trend in climate stud¬ 
ies. 

Methodologically speaking, DA methods are tradi¬ 
tionally grouped into two categories: sequential and vari¬ 
ational (Ide et al., 1997, and references therein). In 
the sequential approach (Ghil et al., 1981), the state 
estimate and a suitable estimate of the associated er¬ 
ror covariance matrix are propagated in time until new 
observations become available and are used to update 
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the state estimate. In practice, the evolution of the 
system of interest is retrieved — like in earlier, typi¬ 
cally much smaller-dimensional applications (Kalman, 
1960; Jazwinski, 1970; Gelb, 1974) — through a se¬ 
quence of prediction and analysis steps. In the varia¬ 
tional approach, on the other hand, one seeks the sys¬ 
tem trajectory that best fits all the observations dis¬ 
tributed within a given time interval (Le Dimet and 
Talagrand, 1986; Ide et ah, 1997; Bocquet, 2012). Here, 
we concentrate on the sequential approach, but the 
two approaches are complementary and the choice of 
method depends on the specihcs of the problem at hand 
(Ghil and Malanotte-Rizzoli, 1991; Ide et ah, 1997; Ta¬ 
lagrand, 1997). 

Abundant literature is available on DA and on Kalman- 
type Hlters. Kalman (1960) first presented the solution 
in discrete time for the case in which both the dynamic 
evolution operator M in Eq. 4 and the observation oper¬ 
ator H in Eq. 5 are linear, and the errors are Gaussian. 
Under these assumptions, the state-estimation problem 
for the system given by Eqs. (5, 4) has an exact solution 
given by the following sequential Kalman filter (KF) 
equations: 


= xf-bK(yt-Hx{), (7a) 

P“ = (I-KH)P{, (7b) 

xf+i = Mx“ , (7c) 

Pf+i = MP“M' + Q . (7d) 


where ' denotes the transpose operation. Here Eqs. (7a) 
and (7b) are referred to as the analysis step and de¬ 
noted by a superscript a, while the forecast step is 
given by Eqs. (7c) and (7d), and is denoted by a super¬ 
script / (Ide et ah, 1997). The vector x“ and the matrix 
P“ are the mean and covariance of Xt conditional on 
(Yi,...,Y*) = (yi,...,yt); K = P/H'(HPfH'+ R)-i 
is the so-called Kalman gain matrix; while Q and R are 
the covariances associated with Vf and wt, respectively. 
Following Wiener (1949), one distinguishes between/il- 
tering, in which x“ and P“ are conditioned only on 
the previous and current observations (yo,...,yt), and 
smoothing, in which they are conditioned on the entire 
sequence, 0 < t < T. Furthermore, the sequential algo¬ 
rithm needs to be initialized at time t = 0 with Xq and 
Pg, which thus represent the a priori mean and covari¬ 
ance of Xg, respectively, and have to be prescribed by 
the user. 

The likelihood function /(y), which is of primary 
importance for DADA, also has an exact expression 
under the above linearity and Gaussianity assumptions 


(Tandeo et ah, 2014), given by: 

T 

/(y)=n(27r)-^|S*|-i 

i=0 (8) 

X exp |-i(y( - Hxf)'St"^(yt - Hx{)| , 

with Xt = HP{H' -|- R. The proof of Eq. (8) is pro¬ 
vided in the Appendix, and /(y) is typically computed 
by taking the logarithm of this equation to turn the 
product on the right-hand side into a sum. 

It follows from the above that, once the observations 
yt have been assimilated on the interval 0 < t <T, the 
necessary ingredients x( and P^ in Eq. 8 are available 
and thus calculating /(y) is both straightforward and 
computationally inexpensive. The fundamental connec¬ 
tions between this calculation, the HMM context, and 
Bayes theorem are further clarified in the Appendix. 

Many difficulties arise in applying the simple ideas 
outlined here to geophysical models, which are typically 
nonlinear, have non-Gaussian errors and are huge in 
size (Ghil and Malanotte-Rizzoli, 1991). Most of these 
difficulties have been addressed by improving both se¬ 
quential and variational methods in several ingenious 
ways (Bocquet et ah, 2010; Kondrashov et ah, 2011). 

In particular, the Ensemble Kalman Filter (EnKF; 
Evensen, 2003)— in which the uncertainty propagation 
is evaluated by using a finite-size ensemble of trajec¬ 
tories — is now operational in numerical weather and 
oceanic prediction centers worldwide; see e.g. Sakov et 
al. (2013); Houtekamer et al. (2014). The EnKF is a 
convenient approximate solution to the filtering prob¬ 
lem in a nonlinear, large-dimensional context. We sim¬ 
ply note here that it can also be applied to obtain an ap¬ 
proximation of the likelihood /(y) by substituting the 
approximate sequence {(xf,P{) : t = 0, ...,T} that 
the EnKF produces into Eq. 8. This strategy is illus¬ 
trated immediately below in the context of the L63 con¬ 
vection model subject to an additional constant force. 


3 Implementation within the modified L63 
model 

3.1 The modihed model and its two worlds 

A simple modification (Palmer, 1999) of the L63 sys¬ 
tem (Lorenz, 1963) has been extensively used for the 
purpose of illustrating methodological developments in 
both DA and D&A [e.g. (Carrassi and Vannitsem, 2010; 
Stone and Allen, 2005)]. In the nonlinear, coupled sys¬ 
tem of three ordinary differential equations (ODEs) for 
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X, y and z below, 
dx 

= (j{y - x) + COS 9i , 

O') 

dj/ , . . dz „ ^ ' 

— = px - y - xz + , — = xy - j3z 

the time-constant forcing terms in the x- and y-equation 
represent, in fact, an addition to the forcing hidden in 
the original L63 model. The latter forcing is revealed by 
a well-known linear change of variables, in which x and 
y are left unchanged and z ^ z + p + a (Lorenz, 1963). 
In the new variables, the model of Eq. (9) will take the 
canonical form of a forced-dissipative system (Ghil and 
Childress, 1987, Sec. 5.4), with an extra forcing term 
—/3(p -I- a) in the z-equation, just like the original L63 
model. 

Here Xi is the intensity of the additional forcing and 
0i is its direction in world i = 0,1: i.e., Aq = 0 repre¬ 
sents a counterfactual world with no additional forcing, 
while Al 7 ^ 0. We take the parameters (cr, p, /3) to equal 
their usual values (10,28,8/3) that yield the well-known 
chaotic behavior, and the (nondimensional) time unit t 
is interpreted as equaling days. 

The ODE system given by (9) is discretized by using 
At = 0.01 and t refers hereafter to the number of time 
increments At. This system is then turned into one of 
stochastic difference equations [SZ\Es: Arnold (2003); 
Chekroun et al. (2011)] by adding an error term vj 
assumed to be Gaussian and centered with covariance 
Q = (Tq I, where I is the 3x3 identity matrix. Further¬ 
more, we assume that all three coordinates (x,y,z) of 
the state vector are observed, i.e. that H = I, and that 
the measurement error term Wj is also Gaussian and 
centered, with covariance Ti. = a]^l. Recalling the no¬ 
tation introduced in Sec 2a, we associate a label lo G f2 
with each realization of the pair of random processes 
(vt, wt) that drive the model given by Eq. (9) and per¬ 
turb its observations, respectively. 

The SZ\E system defined above is stationary, i.e. 
the PDF of the state vector xt depends neither on t 
nor on xg after a sufficiently long time t. This PDF can 
be obtained as the (numerical) solution of the Fokker- 
Planck equation associated with Eq. (9), and it is the 
mean over [2 of the sample measures obtained for each 
realization oj of the noises Vj and wt (Ghekroun et al., 
2011, and references therein). Each sample measure is 
supported on a random attractor that may have very 
fine structure and be time-dependent (Ghekroun et al., 
2011, Figs. 1-3 and supplementary material), but the 
PDF is supported smoothly, in the counterfactual world 
in which Aq = 0 , on a “thickened” version of the fairly 
well-known strange attractor of the original L63 model. 

In the factual world in which Ai 7 ^ 0, the nature 
of the PDF is quite similar, but its exact shape is af¬ 


fected by the parameters (Ai, 0i) of the forcing. In both 
worlds, the PDFs can be estimated, for instance, by us¬ 
ing kernel density estimation applied to ensembles of 
simulations obtained for either forcing. In Figs. 2a,b, we 
plot the projections of both PDFs onto the plane associ¬ 
ated with the greatest variance in the factual PDF. The 
difference between the two PDFs is shown in Fig. 2c; 
it emphasizes the existence of an area of the state space 
(represented in white), which is more likely to be reached 
in the factual world than in the counterfactual one. 

Next, we define an event to occur for the sequence 
{yt : t = 0,... ,T} if the scalar product 0'yt between 
the unit vector (j) in the direction (j) and yt, i.e. the pro¬ 
jection of yt onto the direction <j), exceeds u for some 
0 < t < T, where ^ is a specified direction and u is 
a threshold chosen based on (j) so that px = 0.01. Fig¬ 
ure 2d shows a selection of sequences from both worlds 
in which an event did occur, where (p was chosen to be 
the leading direction in the projection plane. 

For this choice of p, the trajectories associated with 
event occurrence happen to all lie in the area of the 
state space which is more likely to be reached in the 
factual world than in the counterfactual one. Accord¬ 
ingly, the probability of the event in the former is found 
to be higher than in the latter, i.e. pi > po, and the oc¬ 
currence of an event {max{o<t<T} ^ is thereby 
informative from a causal perspective, i.e. the associ¬ 
ated probabilities of necessary and sufficient causation 
are positive. 

Figure 2d also shows that the trajectories associ¬ 
ated with the event in the two worlds — counterfactual 
(green) and factual (red) — appear to have slightly dis¬ 
tinct features: the red trajectories are shifted towards 
higher values in the second direction, of highest-but-one 
variance. Such distinctions might help discriminate fur¬ 
ther between the two worlds in the DADA framework. 

3.2 DADA for the modified L63 model 

The DADA procedure is illustrated in Fig. 3. We plot in 
panel (a) a trajectory of the state vector Xj simulated 
under factual conditions, i.e. in the presence of the addi¬ 
tional forcing (black solid line), along with the observa¬ 
tions {yt '■ 0 < t < T} (gray dots), with T = 400. The 
EnKF is used to assimilate these observations into a 
factual model (1 = 1) that thus matches the true model 
M = Ml = M(Ai,0i) used for the simulation: a recon¬ 
structed trajectory is obtained from the corresponding 
analyses x/ (red solid line in panel (a)), cf. Eqs. (7), 
and the likelihoods /i(yt) (red solid line in panel (c)) 
are obtained by application of Eq. (8), respectively. 

Next, the assimilation is repeated in the counter- 
factual model {i = 0, i.e. A = 0) to obtain a second 
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analysis of the trajectory, from the same observations; 
see green solid line in panel (a), for T = 400. The corre¬ 
sponding likelihoods /o(yt) are shown in panel (c) as a 
green solid line. Comparing the trajectories of the two 
analyses in Fig. 3a shows that, even though the coun- 
terfactual analysis (green line) uses the same data as 
the factual analysis (red line), the former lies closer to 
the true trajectory (black line). 

The local discrepancies between the trajectories es¬ 
timated in the two worlds appear to be rather small 
at first glance, cf. panel (a), and so are the instanta¬ 
neous differences between the associated factors on the 
right-hand side of Eq. (8); the latter are shown as gray 
rectangles in panel (c) of the figure. Still, the evidence 
in favor of the factual world accumulates as the time t 
over which the two trajectories differ, albeit by a small 
amount, lengthens. This cumulative difference in evi¬ 
dence, log/o(y0 - log/i(yi)> is reflected by a growing 
gap between the two curves, red and green, in panel 
(c), and by an associated high mean growth over time 
of the probability PN of necessary causation, cf. the 
black solid line in panel (d). 

In order to evaluate more systematically its perfor¬ 
mance and robustness compared to the conventional 
FAR approach, the DADA procedure was applied to 
a large sample of sequences yt of length T = 20 sim¬ 
ulated under diverse conditions. The sample explored 
all possible combinations of the triplet of parameters 
(Ai, CTQ, (Jr), with ten equidistributed values each, for a 
total of 10^ combinations; the ranges were 0 < Ai < 40, 
0.1 < <JQ < 0.5 and 0.1 < <Jr < 1.0, respectively, with 
9i = —140°. For each combination of {Xi,(Jq,(Jr), ten 
directions </> were randomly generated and u was de¬ 
fined based on </> as in Sec. 3a above, so as to achieve 
Pi > 0.01. 

In order to estimate the corresponding conventional 
probabilities po and pi of the associated event defined 
as {max{g<i< 7 ’} (p'yt > m}, n = 50 000 sequences y* 
of length T = 20 were simulated, by using a single se¬ 
quence of length nT = 10® and splitting it into n equal 
segments. Probabilities po and pi were then directly 
estimated from empirical frequencies because the high 
value of n here did not require the use of the EVT ex¬ 
trapolation normally used for smaller n. 

Eor each quintuplet of parameter values 
(Ai, (TQ, CT/j; ()), m), one hundred sequences of observa¬ 
tions {yt : 0,..., T = 20} were generated with a pro¬ 
portion pi/{pi + Po) being simulated from the factual 
world and a proportion Po/{pi + Po) from the counter- 
factual one. All sequences were treated with the DADA 
procedure — by applying DA to the synthetic observa¬ 
tions according to Eqs. (7a)-(7d) — and then Eq. (8) 
to obtain /o(y) and /i(y) from the reconstructed tra¬ 


jectories. The a priori mean and covariance Xq and Pq 
required as inputs to the DADA procedure were those 
associated with the PDF of the attractor, given the forc¬ 
ing conditions (Ai G [0,40], 6*1 = —140°) assumed for 
each assimilation experiment. As a result, two prob¬ 
abilities PN of necessity are finally obtained for each 
sequence yt, PNp = 1 —polpi for the conventional ap¬ 
proach and PN f = 1 — /o (y)//i (y) for the DADA ap¬ 
proach. 

We next wish to evaluate under various conditions 
how well the two probabilities PNp and PN f perform 
with respect to discriminating between the factual and 
counterfactual forcings. Consider a simple discrimina¬ 
tion rule whereby a trajectory yt is identified as factual 
for PN exceeding a given threshold, and as counterfac¬ 
tual otherwise. The so-called receiver operating charac¬ 
teristic (ROC) curve plots the rate of true positives as 
a function of the rate of false positives obtained when 
varying the threshold in a binary classification scheme 
from 0 to 1 ; it thus gives an overall visual representation 
of the skill of our PN as a discriminative score. 

The Gini (1921) index G was originally introduced 
as a measure of statistical dispersion intended to sum¬ 
marize the information contained in the Lorenz (1905) 
curve that represents the income distribution of a na¬ 
tion’s residents; G may be viewed, though, more gen¬ 
erally as a metric summarizing the dispersion of any 
smooth curve that starts at the origin and ends at the 
point ( 1 , 1 ) with respect to the diagonal of the corre¬ 
sponding square. In particular, we use G here to sum¬ 
marize into a single scalar the ROC curve, which ranges 
from 0 for random discrimination to 1 for perfect dis¬ 
crimination. 

Figure 4a shows ROC curves obtained over the en¬ 
tire sample of n = 50 000 sequences: they correspond to 
G = 0.35 for the conventional method and to G = 0.82 
for the DADA method, i.e. the overall performance gap 
is more than twofold. As expected, the performance of 
both methods is nil for Ai = 0 and it is very sensitive 
to the intensity of the forcing, cf. Fig. 4b. 

Furthermore, the skill of the DADA method is boosted 
when decreasing the level of model error, cf. Fig. 4c; this 
is an expected result, since DA becomes more reliable 
when the model is more accurate, and when it is known 
to be so. Ultimately, under perfect model conditions, 
i.e. as CTQ —)■ 0, DADA reaches perfect discriminative 
power, with G—> 1, no matter how small, but still posi¬ 
tive, the forcing is; see Fig. 4d. On the other hand, the 
level of observational error gr appears to have but a 
limited effect on DADA performance for the range of 
values considered, cf. Fig. 4e. 

Finally, Fig. 4f shows that both methods perform 
better when the contrast between po and pi is strong. 
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but the latter does not influence the gap between the 
two methods, which remains nearly constant. This con¬ 
stant gap thus appears to quantify the additional power 
resulting from the extra discriminative features that the 
PDF /(y) is able to capture on top of those associated 
with the probability P{4>{y) > u). 

4 Discussion and conclusions 

Hannart et al. (2015) have relied on the causality the¬ 
ory of Pearl (2000) to show that the ratio between 
the factual evidence /i(y) and the counterfactual evi¬ 
dence /o (y) is important in studying causal attribution 
of weather- and climate-related events. In this paper, 
we first described data assimilation (DA) methods and 
then demonstrated that they are well suited for deriv¬ 
ing /o(y) and /i(y) from trajectories in the factual and 
the counterfactual worlds, respectively. Besides, these 
methods offer the key practical advantage of being al¬ 
ready up-and-running in near real time at meteorolog¬ 
ical centers. 

Combining these two sets of considerations, theo¬ 
retical and practical, opens a novel route towards near 
real time, systematic causal attribution of weather- and 
climate-related events, thereby addressing a key chal¬ 
lenge in the field of detection and attribution (D&A) 
at present (Stott et al., 2015). 

4.1 Theoretical considerations 

Implementing the DA for D&A (DADA) approach in 
the context of the L63 model in Section 3 allowed for a 
detailed step-by-step illustration of our methodological 
proposal. It also provided a basic test for an initial per¬ 
formance assessment, which showed an improved level 
of discriminating power with respect to the conven¬ 
tional approach outlined in Section 1. These results 
are promising, and their promise is easy to understand, 
given the fact that the DADA approach leverages the 
available information on the entire trajectory y, as op¬ 
posed to the single specihc feature l<^(y)>« in the con¬ 
ventional approach. 

It is important, though, to stress that the term “per¬ 
formance” here should be considered with caution: im¬ 
proving discriminatory performance may or may not be 
a desirable outcome, depending on the causal question 
being asked. Hannart et al. (2015) have shown that the 
causal question being formulated reflects the subjective 
interests of a particular class of end-users, and that the 
formulation itself may dramatically affect the answer. 

For example, the question “did anthropogenic CO 2 
emissions cause the heatwave observed over Argentina 


during January 2014?” has been traditionally treated 
by dehning a “heatwave” in terms of a predehned tem¬ 
perature index reaching a predefined threshold, i.e., by 
a singular index exceeding a singular threshold. This 
class of questions matters for instance in the context of 
insurance disbursements, where a hnancial compensa¬ 
tion may typically be triggered based on such an index 
exceedance. In this situation, the additional discrimina¬ 
tory power of DADA is meaningless because the DADA 
computation does not address the question at stake: 
there is simply no alternative to computing the proba¬ 
bilities po and Pi of the index exceeding the threshold. 

However, if the question is formulated instead as 
“did anthropogenic CO 2 emissions cause the atmospheric 
conditions observed over Argentina during January 2014 ?” 
— i.e., without specifying which feature of the observed 
sequence is most important — then improving discrim¬ 
ination makes perfect sense and DADA becomes fully 
relevant. Furthermore, DADA is still fully relevant even 
if the question is formulated more specifically as “did 
anthropogenic CO 2 emissions cause the damages gen¬ 
erated in Argentina by the atmospheric conditions of 
January 2014?” provided that is, that a model relat¬ 
ing atmospheric observations to damages at every time 
step t along the trajectory of the physical model used 
in the assimilation is available and can be integrated 
into the observation operator H. 

On the other hand, the results of Section 3 should 
also be considered with caution simply because the L63 
testbed obviously differs in many respects from the real 
situation envisioned for future applications, both in terms 
of model dimension n and observation dimension d: in 
practice n will be very large and d n, while here we 
took d = n = 3. 

In particular, choosing a highly idealized, climato¬ 
logical a priori distribution on the initial condition 7r(xo) 
does not raise any difficulty under the tested conditions 
nor does it influence significantly the outcome of the 
procedure (not shown). The choice of 7r(xo), however, 
may be an important problem in practice, when d ^ n, 
and lead to potentially spurious results. 

As a consequence, it may be both necessary and use¬ 
ful to further constrain the so-called background PDF 
7 r(xo) by using the forecasts originating from r previous 
assimilation cycles, thus following the ideas of lagged- 
averaged forecasting (Hoffman and Kalnay, 1983; Dalcher 
et ah, 1988). The evidence thus obtained, though, will 
then also depend on previous observations over the “ini¬ 
tialization” window [—T,..., — 1] — i.e., it will no longer 
represent exclusively the desired evidence /(y). Besides, 
choosing r optimally to constrain the initial background 
PDF in a satisfactory manner, while at the same time 
limiting the latter unwanted dependence on previous 
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observations, is a challenging question that needs to be 
adressed. 

More generally, the problem of evaluating the ev¬ 
idence /(y) is not new in the HMM and DA litera¬ 
ture; see, for instance, Baum et al. (1970); Hiirzeler and 
Kiinsch (2001); Pitt (2002) and Kantas et al. (2009). 
Various algorithms are thus available to carry out this 
evaluation, depending on a number of key assumptions 
— such as lack of Gaussianity or linearity — and on the 
inferential setting chosen, e.g. particle filtering. These 
algorithms may provide accurate and effective solutions 
to the above problem, as well as improved alternatives 
to the Gaussian and linear approximation of Eq. (8), 
since the latter may not be sufficiently accurate for suc- 
cesfully implementing the DADA approach under real¬ 
istic conditions. 

4.2 Practical considerations 

While we have shown here that the proposal of using 
DADA for event attributions has intellectual merit, its 
main strength lies, in our view, in down-to-earth cost 
considerations. By design, the DADA approach allows 
one to piggyback at a low marginal cost on the large 
and powerful infrastructures already in place at sev¬ 
eral meteorological centers, in terms of both hardware 
and personnel. These centers are capable of process¬ 
ing massive amounts of observational data with high- 
throughput pipelines on the world’s largest computa¬ 
tional platforms, as opposed to requiring the design, 
set-up and maintenance of a new and large, D&A-specific 
infrastructure to collect observations and generate — 
under near real time constraints — the many model 
simulations required by the conventional approach re¬ 
called in Section 1. 

Taking a step back, it is useful to examine our pro¬ 
posal within the wider context of the emergence of so- 
called climate services. It is widely recognized that ex¬ 
tending the scope of activity of meteorological centers 
from being “monoline” weather forecasting providers 
to becoming “multiline” climate services providers - en¬ 
compassing, for instance, weather forecasting and weather 
event attribution as two service lines among several oth¬ 
ers - is a relevant strategic option (Hewitt et ah, 2012). 
Such a strategy may foster the timely and cost-efficient 
emergence of the latter services by building upon tech¬ 
nological and infrastructure synergies with the former. 
For these reasons, our proposal is particularly relevant 
for, and could contribute to, the implementation of the 
strategic option just outlined. 

This being said, DADA can very well serve as a 
method for near real time event attribution even for hy¬ 
pothetical climate services providers that focus uniquely 


or mainly on longer time scales, beyond a month, a sea¬ 
son or a year. In such a context, DADA may allow for 
the assimilation of a broader range of observations, and 
in particular of ocean observations; it may, in fact, be 
important to include the latter in causal analysis when 
the event occurrence under scrutiny is defined over a 
sufficiently large time window. 
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Appendix: Derivation of the model evidence 

In this appendix, we outline the derivation of model 
evidence within a general Bayesian framework, and we 
apply the latter to the narrower KF context to obtain 
Eq. (8). Gonsider two consecutive cycles of a DA run, 
the first with state vector Xj and observation vector yt 
at instant t and the subsequent one with state vector 
Xt_|_i and observation vector yt+i at instant t + 1. We 
plan to find a tractable expression for the model evi¬ 
dence p(y(,yt+i). 

The model evidence provided by the full sequence of 
observations y = (yo, ■■■,yT) will be inferred by recur¬ 
sion, using the results of this two-observation setting. 
In order to decouple the two cycles, one first has to spell 
out the Bayesian inferencep(yt, yt+i) = piyt)piyt+i\yt) 
We look for a tractable expression for p{yt+i |yt) by fur¬ 
ther introducing the states Xj+i and x^ as intermediate 
random variables: 

p(yt+i|yt) = / p{yt+i\yt,xt+i)p{xt+i\yt)dxt+i 

= / p{yt+i\^t+i) 

-'Xt + i 

X {y p(xt+i|xt)p(xt|y() dx(| dx^+i , 

( 10 ) 

where p(y(+i|xt_|_i) is the likelihood of the observation 
vector yt+i conditional on the state vector x^+i and it 
is known from Eq. (5). 

The conditional PDF p{xt\yt) of x* on yt at instant 
t — which appears on the right-hand side of the above 
equation — is referred to as the analysis PDF in the DA 
literature, where it is denoted by a superscript a (Ide 
et ah, 1997), and it constitutes the main DA output. 
The integral /^^p(xt+i|xt)p(xt|y() dx^ =p{xt+i\yt), in 
which p(xi_|_i|xt) is known from the model dynamics 
given by Eq. (4), propagates this analysis PDF further 
in time, to instant t -I- 1. Hence, the result of this in¬ 
tegration coincides with the forecast PDF, denoted by 
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superscript / in the DA literature (Ide et al., 1997). 
It follows that this decomposition is tractable using a 
DA scheme that is able to estimate the conditional and 
forecast PDFs. 

Next, let us apply the general Bayesian inference 
(10) to the case in which all the PDFs involved are 
Gaussian; this requires, in turn, that both the dynam¬ 
ics and observation models M and H be linear, and 
that the input statistics all be Gaussian. In this case, 
the Kalman filter allows for the exact computation of 
the PDFs mentioned in Eq. (10), which turn out to be 
Gaussian. 

In the following, A/’(x, P) designates the Gaussian 
PDF of mean x and covariance matrix P. In this con¬ 
text, the analysis PDF at instant t is A/'(x“, P“), where 
x“ and P“ are the analysis state and error covariance 
matrix at instant t. As a result of the linearity assump¬ 
tions, the forecast PDF at instant t -I- I is given by a 
Gaussian distribution A/'(xf_|_j^, Pf^j^), where xf_|_j^ and 

r 

P(_i_]^ are the forecast state and error covariance ma¬ 
trix at instant t+1. Further, the integration on Xj+i in 
Eq. (10) can readily be performed under these circum¬ 
stances, with the outcome that p{yt+i |yt) is distributed 
as A7(Hxf+i, R -f HPf+i H'). 

The desired model evidence /(y) can then be com¬ 
puted by recursion on successive time steps as: 


T 

/(y) =p{yo) J|(27r)-5|S(|-5 

( 11 ) 

xexp{-i(yt-Hx/)'S7-^(yt-Hxf)} ; 

here p{yo) represents the prior PDE of the initial state, 
= R -I- HP{H', and This expression coincides with 
Eq. (8) and can be evaluated with the help of any DA 
method that yields the forecast states and forecast error 
covariance matrices, such as the KE or the EnKE. Note 
that the traditional standard Kalman smoother would 
give the same result as the KE, since they share the 
same forecasts. 

Einally, Eqs. (10) and (11) above show that the like¬ 
lihood /(y) may be obtained as a by-product of the in¬ 
ference on the state vector x, which usually is the main 
purpose in numerical weather prediction. This idea may 
actually be highlighted in even greater generality by 
considering the equality: 


/(y) = 


pjy I x)p(x) 

p(x I y) 


( 12 ) 


While Eq. (12) is a direct consequence of Bayes theo¬ 
rem, it also illustrates a point that is arguably not so 
intuitive. The likelihood /(y) is obtained here as the ra¬ 
tio of two quantities: a numerator p(y | x)p(x) that is 


a model premise inherently postulated by Eqs. (5) and 
(4), and a denominator p(x | y) that may be viewed as 
the end result of the primary inference on x. In other 
words, estimating /(y) requires only a straightforward 
division, provided x has been previously inferred. 

Equation (12) thus expresses with great clarity and 
simplicity a fundamental idea buttressing our proposal, 
as it provides a general theoretical justification for the 
suggestion of deriving the likelihood from an inferential 
treatment that focuses on x. To put it succintly, this 
equation basically says, “He who can do more can do 
less. ” In the context of DA, whose end purpose is to 
infer the state vector x out of an observation y — i.e., 
the more part — it is possible to obtain the likelihood 
as a by-product thereof— i.e., the less part — and thus 
almost for free. 
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Fig. 1 Illustration of the conventional D&A approach as ap¬ 
plied to a univariate AR(1) process, (a) Observed time se¬ 
ries (first component Yi, dotted line) and daily average (('(Y) 
(heavy solid line), (b) Threshold level (vertical axis) as a func¬ 
tion of the return period (horizontal axis): simulated values 
(crosses); fit based on the Generalized Pareto distribution 
(GPD, heavy dark-blue line); uncertainty range at the 95% 
level (light blue area); and threshold value u = 3.1 (light solid 
black line), (c) Estimated value of P = P((()(Y) > u) (heavy 
dark-blue line) using a GPD fit as a function of the sample 
size n (horizontal axis); uncertainty range (light blue area); 
and true value P = 0.01 (light solid black line), (d) Computa¬ 
tional time on a desktop computer (seconds, vertical axis) as 
a function of sample size n (horizontal axis) required by the 
conventional method (dark blue line) and the DADA method 
(solid red line); the latter method is explained in Sections 2b 
and 3 below. 
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Fig. 2 Two-dimensional (2-D) projections of the PDF of the 
modified L63 model; the projection is onto a plane defined by 
the two leading eigenvectors of the factual PDF shown in the 
first panel, (a) PDF of the factual attractor, with Ai = 20 and 
(jQ = 0.1; and (b) PDF of the counterfactual attractor, with 
Aq = 0. (c) Difference between the factual and counterfac¬ 
tual PDFs, (d) Sample trajectories associated with an event 
occurrence originating from the factual (red solid lines) and 
counterfactual worlds (green solid lines); the vertical dashed 
line in all four panels indicates the threshold u with respect 
to the horizontal axis of largest variance in the factual PDF. 
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- X® counterfactual - factual -x* • y“ 



cumulative counterfactual on [0,t] 

- cumulative factual on [0,t] 

□ difference at instant t 



Fig. 3 Sample trajectories from data assimilation (DA) in 
our modified L63 model, (a) True trajectory (black solid line) 
and the two trajectories reconstructed by DA in the factual 
(i = 1) and counterfactual [i = 0) worlds (red and green solid 
lines), respectively, over a long sequence, T = 400; the values 
of Ai and 6\ here are the same as in Fig. 2, and the assimilated 
observations are shown as gray dots, (b) Same as panel (a) 
but zoomed over a short sequence, T = 20. (c) Logarithm 
of the cumulative evidences /i(y) and /o(y) (red and green 
lines, respectively) computed over the window [0, t < T]; gray 
bars indicate the instantaneous differences between /i(yt) 
and /o(yt). (d) PN computed over the window [0,t]. 
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DADA - conventional 


Fig. 4 Performance of the DADA and conventional methods 
(red vs. blue solid lines, respectively), (a) Receiver operating 
characteristic (ROC) curve: true positive rate as a function 
of false positive rate, when varying the cut-off level u, as ob¬ 
tained from the entire sample of n = 50 000 sequences; see 
text for details., (b) Gini index G as a function of forcing 
intensity Ai. (c) Same as (b) for several values of ctq and for 
DADA only, with the black arrow indicating the direction of 
growing (Jq . (d) Same as (b) but as a function of model error 
amplitude cfq. (e) Same as (b) but as a function of observa¬ 
tional error amplitude cru. (f) Same as (b) as a function of 
the logarithmic contrast between the conventional probabili¬ 
ties logpi/po- 
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